ABSTRACT

A new and unified approach to test equating is described that is based on log-linear models for smoothing score distributions and on the kernel method of non-parametric density estimation. The new method contains both linear and standard equipercentile methods as special cases and can handle several important equating data collection designs. An example is used to illustrate the new method for the random groups and external anchor-test designs.
1. INTRODUCTION

This paper introduces a new and unified approach to test equating based on a flexible family of equating functions that contains both the linear and the equipercentile equating functions as special cases. The new method grows out of the perspective on observed-score test equating described in Braun and Holland (1982). We call the new approach the "kernel method of equating tests" because of its close connection to the well-studied methods of non-parametric density estimation using a Gaussian kernel, Tapia and Thompson (1978). The kernel method may be viewed as generalizing certain features of the equipercentile method described by Angoff (1984). Because of this we first review the equipercentile method from our perspective; this also allows us to introduce our notational scheme.

Review of equipercentile equating

Suppose we have two tests, denoted by X and Y, and let the possible raw-score values for X and Y be denoted by $x_1, \ldots, x_J$ and $y_1, \ldots, y_K$, respectively. In this notation, J and K are the numbers of possible raw-score values and not the numbers of test items on X and Y. In the applications that concern us, $x_1, \ldots, x_J$ will denote consecutive integers; similarly for $y_1, \ldots, y_K$. If, for example, X is a number-right scored test, then $x_1 = 0$, $x_2 = 1$, ... and $x_J$ = the number of items in test X. Alternatively, for a rounded formula-scored X, $x_1$ is negative but $x_J$ still denotes the number of items in X.

As Braun and Holland (1982) emphasize, observed-score test equating always takes place on a specific population of examinees. We suppose that this population is fixed and let $r_j$ and $s_k$ denote the score probabilities for this population, i.e.,

$$r_j = \mathrm{Prob}\{X = x_j\}, \qquad s_k = \mathrm{Prob}\{Y = y_k\}. \tag{1}$$

In (1), we abuse notation slightly and let X denote both a test and the score of a randomly selected examinee on this test (similarly for Y). The score probabilities, $\{r_j\}$ and $\{s_k\}$, are population parameters and depend on the underlying population of examinees. They must be estimated from the data collected in the equating experiment. We defer a serious discussion of how they might be estimated to section 3 and merely suppose that estimates, $\{\hat r_j\}$ and $\{\hat s_k\}$, are available.

Associated with the score probabilities are the cumulative distribution functions (cdfs) of the test scores for X and Y that are defined by

$$F(x) = \mathrm{Prob}(X \le x) = \sum_{j:\, x_j \le x} r_j, \tag{2}$$

and

$$G(y) = \mathrm{Prob}(Y \le y) = \sum_{k:\, y_k \le y} s_k. \tag{3}$$

In (2), x denotes any real number and the summation is over all j for which $x_j$ does not exceed x. In (3), y denotes any real number and the sum is over all k for which $y_k$ does not exceed y. The cdfs, F and G, defined in (2) and (3) are step functions with jumps at the possible values for X and Y, respectively.

If F and G were continuous cdfs (as is, for example, the cdf for the normal distribution) then the equipercentile equating function for equating X to Y would have the form

$$e_Y(x) = G^{-1}(F(x)) \tag{4}$$

and for equating Y to X it would have the form

$$e_X(y) = F^{-1}(G(y)) \tag{5}$$

where $F^{-1}$ and $G^{-1}$ denote the inverse functions of F and G defined by

$x = F^{-1}(p)$ if and only if $p = F(x)$

and

$y = G^{-1}(p)$ if and only if $p = G(y)$.

See Braun and Holland (1982) for more discussion of this description of equipercentile equating.

If F and G were continuous, the functions $e_Y(x)$ and $e_X(y)$ defined in (4) and (5) would exactly match the distribution of $e_X(Y)$ to that of X and the distribution of $e_Y(X)$ to that of Y. However, in practice F and G are discrete so that, strictly speaking, $F^{-1}$ and $G^{-1}$ do not exist and hence $e_Y$ and $e_X$ cannot be defined as in (4) and (5). This fact is usually glossed over in discussions of equipercentile equating (e.g. Angoff, 1984; Lord, 1950). Instead, F and G are approximated by linear interpolation to obtain percentile ranks. It is instructive to see exactly how this linear interpolation is derived mathematically, and we now do this.

The percentile rank of a score $x_i$ is defined as the proportion of examinees in the population scoring below $x_i$ plus one-half of the proportion scoring exactly $x_i$ (Angoff, 1984). How can such a definition be justified? Here is one approach to justifying it.

Suppose U is a random variable with a uniform distribution on $(-\tfrac12, \tfrac12)$, and suppose that U is independent of the discrete random variable X where

$$r_j = \mathrm{Prob}\{X = x_j\}, \qquad j = 1, \ldots, J.$$

The cdf of U is given by

$$\mathrm{Prob}\{U \le u\} = \begin{cases} 1 & \text{if } u > \tfrac12, \\ 0 & \text{if } u < -\tfrac12, \\ u + \tfrac12 & \text{if } -\tfrac12 \le u \le \tfrac12. \end{cases} \tag{6}$$
Now consider a new random variable X* defined by

$$X^* = X + U. \tag{7}$$

The new variable X* has a continuous distribution that is spread over the interval $x_1 - \tfrac12$ to $x_J + \tfrac12$. The cdf of X* is found as follows.

$$\begin{aligned}
F^*(x) &= \mathrm{Prob}\{X^* \le x\} = \mathrm{Prob}\{X + U \le x\} \\
&= \sum_j \mathrm{Prob}\{X + U \le x \mid X = x_j\}\,\mathrm{Prob}\{X = x_j\} = \sum_j \mathrm{Prob}\{U \le x - x_j \mid X = x_j\}\, r_j \\
&= \sum_j \mathrm{Prob}\{U \le x - x_j\}\, r_j.
\end{aligned}$$

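But from (6) it follows that

$$\mathrm{Prob}\{U \le x - x_j\} = \begin{cases} 1 & \text{if } x > x_j + \tfrac12, \\ 0 & \text{if } x < x_j - \tfrac12, \\ x - x_j + \tfrac12 & \text{if } x_j - \tfrac12 \le x \le x_j + \tfrac12, \end{cases} \tag{8}$$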

and hence we have

$$F^*(x) = \sum_{j:\, x_j \le x - \frac12} r_j + (x - x_i + \tfrac12)\, r_i, \qquad \text{for } x_i - \tfrac12 \le x < x_i + \tfrac12, \tag{9}$$

where the summation in (9) is over all j for which $x_j$ does not exceed $x - \tfrac12$.

Now evaluate F*(x) at $x_i$ and we have

$$F^*(x_i) = \sum_{j:\, x_j < x_i} r_j + \tfrac12\, r_i, \tag{10}$$

which is the probability of scoring below $x_i$ plus one half the probability of scoring exactly $x_i$, and this is the definition of percentile ranks given above. This shows that the percentile rank of $x_i$ is simply the value of the cdf F* at $x_i$, i.e. $F^*(x_i)$. We may view F* as a continuous approximation to the step-function F. From (9) we see that F* is a piecewise linear function that starts at zero at $x_1 - \tfrac12$ and (if the $x_j$ are consecutive integers and $r_j > 0$) steadily increases to the value of 1 at $x_J + \tfrac12$.
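
The identity between percentile ranks and the continuized cdf F* can be checked numerically. The following is a minimal sketch, assuming consecutive integer scores; the function names `percentile_ranks` and `continuized_cdf` and the small score distribution are illustrative choices, not part of the original paper.

```python
def percentile_ranks(probs):
    """Percentile rank of each score: probability below plus half the
    probability at the score, as in equation (10)."""
    ranks = []
    below = 0.0
    for r in probs:
        ranks.append(below + 0.5 * r)
        below += r
    return ranks


def continuized_cdf(x, scores, probs):
    """F*(x) from equations (8)-(9): cdf of X + U, U uniform on (-1/2, 1/2)."""
    total = 0.0
    for xj, rj in zip(scores, probs):
        # Prob{U <= x - xj} is x - xj + 1/2 clipped to the interval [0, 1].
        total += rj * min(1.0, max(0.0, x - xj + 0.5))
    return total


# Hypothetical four-point score distribution used only for illustration.
scores = [0, 1, 2, 3]
probs = [0.1, 0.2, 0.3, 0.4]
ranks = percentile_ranks(probs)
values = [continuized_cdf(x, scores, probs) for x in scores]
```

Evaluating `continuized_cdf` at each score reproduces `percentile_ranks`, and the function runs from 0 at $x_1 - \tfrac12$ to 1 at $x_J + \tfrac12$.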

The standard version of equipercentile equating can be viewed as replacing F by F* and G by a corresponding G*. When $\{r_j > 0\}$ and $\{s_k > 0\}$, the inverse functions, $F^{*-1}$ and $G^{*-1}$, both exist and the functions in (11) are well-defined, i.e.,

$$e_Y(x) = G^{*-1}(F^*(x)) \quad \text{and} \quad e_X(y) = F^{*-1}(G^*(y)). \tag{11}$$

By definition, $e_Y$ and $e_X$ given in (11) are the population equipercentile equating functions for equating X and Y. Sample estimates of $e_Y$ and $e_X$ in (11) are defined by substituting $\hat r_j$ for $r_j$ and $\hat s_k$ for $s_k$ in the definitions of F* and G*, i.e. (9). (In addition, in practice a post-smoothing step may be introduced to make the final equating functions even smoother than the piecewise linear functions in (11), Angoff (1984), Fairbank (1985), Kolen (1984), Kolen and Jarjoura (1987).)

There are various problems with this version of equipercentile equating. For one, consider the mean and variance of X and its "continuous approximation", X*. We have

$$E(X^*) = E(X + U) = E(X) + 0 = E(X)$$

but

$$\mathrm{Var}(X^*) = \mathrm{Var}(X + U) = \mathrm{Var}(X) + \mathrm{Var}(U).$$

It is well-known that $\mathrm{Var}(U) = 1/12$ so that X and X* have the same means but different variances. The higher moments of X* also fail to agree with those of X. Hence, what the traditional version of equipercentile equating actually does is to exactly match the distribution of the two continuous random variables X* and Y* rather than to match the discrete distributions of X and Y. No moments beyond the first can be expected to be exactly matched using the standard equipercentile equating function although they may be close enough for practical work. In addition, because F* and G* place no probability outside the intervals $(x_1 - \tfrac12, x_J + \tfrac12)$ and $(y_1 - \tfrac12, y_K + \tfrac12)$, it is automatically true that $e_Y(x)$ and $e_X(y)$ defined in (11) map the end-points of these two intervals onto each other. This is often an undesirable property in test equating since it usually forces the highest (and lowest) score on X to be mapped onto the highest (and lowest) score on Y. If X were much easier than Y this property is unreasonable and is due solely to the arbitrary use of F* and G* to form the equating function.

These problems with the traditional form of equipercentile equating all stem from the arbitrary form assumed for U, i.e. that it be uniform on $(-\tfrac12, \tfrac12)$. The crux of the kernel method is to replace U with a more flexible choice of random variable. In particular, the point of view taken here is that the traditional equipercentile method is a version of the kernel method using a fixed "bandwidth" (i.e. the variance of the continuous random variable added to X). In general, it is always better to use bandwidths that can vary in useful ways when kernel methods are employed.
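
The value of Var(U) quoted above, and its reading as a fixed bandwidth, follow from a one-line integral for the uniform distribution on $(-\tfrac12, \tfrac12)$. This is a standard calculation added here as a worked step; it is not taken from the original text.

```latex
\mathrm{Var}(U) = E(U^2) - [E(U)]^2
  = \int_{-1/2}^{1/2} u^2 \, du - 0
  = \left[ \frac{u^3}{3} \right]_{-1/2}^{1/2}
  = \frac{1}{12},
\qquad
\sqrt{\mathrm{Var}(U)} = \frac{1}{\sqrt{12}} \approx 0.289 .
```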






2. THE KERNEL METHOD OF EQUATING 

Our approach is to accept the fact that X and Y are discrete and hence that F and G must be approximated, in some sense, by continuous cdfs before (4) and (5) can become well-defined (as they are in (11)). Picking up on the ideas in section 1, suppose we now consider the distribution of the random variable, $X(h_X)$, defined by

$$X(h_X) = a_X (X + h_X V) + (1 - a_X)\,\mu_X \tag{12}$$

where X is the discrete random variable that appeared in section 1 and V is a random variable that is independent of X and has a standard normal, N(0,1), distribution. Also, in (12) $\mu_X$ and $a_X$ are defined by:

$$\mu_X = E(X) = \sum_j x_j r_j, \tag{13}$$

$$a_X^2 = \frac{\sigma_X^2}{\sigma_X^2 + h_X^2}, \tag{14}$$

$$\sigma_X^2 = \mathrm{Var}(X) = \sum_j (x_j - \mu_X)^2 r_j. \tag{15}$$

The bandwidth, $h_X$, is a non-negative constant that we are free to select to achieve some useful purpose. What we have done in (12) is replaced U in (7) by $h_X V$ and then rescaled the sum of X and $h_X V$ to preserve the mean and variance of X, i.e. it is easy to show that

$$E(X(h_X)) = E(X) = \mu_X$$

and

$$\mathrm{Var}(X(h_X)) = \mathrm{Var}(X) = \sigma_X^2$$

for any choice of $h_X \ge 0$. Observe that X(0) is identical to X and $X(\infty)$ is a normal random variable with the same mean and variance as X. When $h_X > 0$, $X(h_X)$ has a continuous distribution with cdf

$$F_{h_X}(x) = \mathrm{Prob}\{X(h_X) \le x\}. \tag{16}$$

We will regard $\{F_{h_X}(x), \text{ for } h_X > 0\}$ as a family of continuous approximations to the discrete cdf F(x). Hence, instead of the single X* of section 1, we may consider the entire collection of approximations, $\{X(h_X),\ h_X > 0\}$.

Observe that $\mathrm{Var}(h_X V) = h_X^2$, whereas in section 1, $\mathrm{Var}(U) = 1/12$. Hence

$$h_X = 1/\sqrt{12} = .289 \approx .3 \tag{17}$$

corresponds roughly to the traditional form of the continuous approximation to F used in equipercentile equating, i.e. F*(x) in (9).
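
The preservation of the mean and variance by the rescaling in (12) can be written out in two lines, using only the independence of X and V, E(V) = 0, Var(V) = 1, and the definition of $a_X$ in (14). This worked step is added for completeness and is not part of the original text.

```latex
\begin{aligned}
E\bigl(X(h_X)\bigr) &= a_X\bigl(E(X) + h_X E(V)\bigr) + (1 - a_X)\mu_X
                     = a_X \mu_X + (1 - a_X)\mu_X = \mu_X, \\
\mathrm{Var}\bigl(X(h_X)\bigr) &= a_X^2\bigl(\mathrm{Var}(X) + h_X^2 \mathrm{Var}(V)\bigr)
                     = \frac{\sigma_X^2}{\sigma_X^2 + h_X^2}\,\bigl(\sigma_X^2 + h_X^2\bigr)
                     = \sigma_X^2 .
\end{aligned}
```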

A nice feature of $F_{h_X}(x)$ is that it has a reasonably tractable analytic form. This is given in Theorem 1, below.

Theorem 1: If $X(h_X)$ is defined by (12) and $F_{h_X}(x)$ is the cdf in (16) then

$$F_{h_X}(x) = \sum_j r_j\, \Phi(R_{jX}(x)) \tag{18}$$

where $\Phi(\cdot)$ denotes the standard normal cdf and $R_{jX}(x)$ is the linear function of x given by

$$R_{jX}(x) = \frac{x - a_X x_j - (1 - a_X)\mu_X}{a_X h_X}. \tag{19}$$

In (19), $\mu_X$ and $a_X$ are defined as in (13) - (15).

Proof:

$$\begin{aligned}
F_{h_X}(x) &= \mathrm{Prob}\{X(h_X) \le x\} = \mathrm{Prob}\{a_X (X + h_X V) + (1 - a_X)\mu_X \le x\} \\
&= \mathrm{Prob}\{a_X h_X V \le x - a_X X - (1 - a_X)\mu_X\} \\
&= \sum_j \mathrm{Prob}\{a_X h_X V \le x - a_X x_j - (1 - a_X)\mu_X \mid X = x_j\}\, r_j \\
&= \sum_j \mathrm{Prob}\Bigl\{V \le \frac{x - a_X x_j - (1 - a_X)\mu_X}{a_X h_X}\Bigr\}\, r_j \\
&= \sum_j r_j\, \Phi(R_{jX}(x)).
\end{aligned}$$

QED.
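
Equations (18) and (19) translate directly into a few lines of code. The sketch below is illustrative only: the function names `normal_cdf` and `kernel_cdf` and the example distribution are assumptions, and the standard normal cdf is computed from the error function in the Python standard library.

```python
import math


def normal_cdf(z):
    """Standard normal cdf, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kernel_cdf(x, scores, probs, h):
    """F_h(x) = sum_j r_j * Phi(R_j(x)) as in (18)-(19); requires h > 0."""
    mu = sum(xj * rj for xj, rj in zip(scores, probs))               # (13)
    var = sum((xj - mu) ** 2 * rj for xj, rj in zip(scores, probs))  # (15)
    a = math.sqrt(var / (var + h * h))                               # (14)
    return sum(
        rj * normal_cdf((x - a * xj - (1.0 - a) * mu) / (a * h))
        for xj, rj in zip(scores, probs)
    )


# Hypothetical symmetric score distribution with mean 2 and variance 1.2.
scores = [0, 1, 2, 3, 4]
probs = [0.1, 0.2, 0.4, 0.2, 0.1]
mid_value = kernel_cdf(2.0, scores, probs, 0.3)
```

For a distribution symmetric about its mean, $F_{h_X}(\mu_X) = \tfrac12$ for every bandwidth, and for very large $h_X$ the function is numerically indistinguishable from the normal cdf with the same mean and variance.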



Because the mean and variance of $X(h_X)$ exactly match those of the original discrete random variable X, it is of interest to know how the higher moments of $X(h_X)$ differ from those of X. It is, however, the cumulants of $X(h_X)$ rather than its moments that have the simplest relationship to those of X. The j-th cumulant of a distribution is the coefficient of $t^j / j!$ in the Taylor expansion (about zero) of the natural logarithm of its moment generating function, M(t).

It is well-known that the first and second cumulants are the mean and variance, respectively, of the distribution. Furthermore, the third and higher cumulants of any normal distribution are all zero. See Kendall and Stuart (1958) for a thorough discussion of cumulants.

Theorem 2 shows the relationship between the cumulants of $X(h_X)$ and those of X.

Theorem 2: If $\kappa_j(h_X)$ denotes the j-th cumulant of $X(h_X)$, and $\kappa_{jX}$ denotes the j-th cumulant of X, then for $j \ge 3$ we have

$$\kappa_j(h_X) = (a_X)^j\, \kappa_{jX} \tag{20}$$

where $a_X$ is defined in (14).

The proof of Theorem 2 is given in the appendix.
We may interpret Theorem 2 by saying that the higher cumulants of $X(h_X)$ are all smaller in absolute size (i.e. more like those of the normal distribution) than the corresponding cumulants of the original distribution of X. This is because

$$(a_X)^j < 1 \quad \text{if } h_X > 0. \tag{21}$$
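
Theorem 2 can be spot-checked for j = 3, where the cumulant equals the third central moment. The sketch below integrates the density of $X(h_X)$, a mixture of normal densities implied by (12), on a fine grid and compares the result with $(a_X)^3 \kappa_{3X}$. The function names, grid settings, and example distribution are illustrative assumptions rather than anything specified in the paper.

```python
import math


def third_cumulant_discrete(scores, probs):
    """Third cumulant (third central moment) of the discrete score X."""
    mu = sum(x * r for x, r in zip(scores, probs))
    return sum((x - mu) ** 3 * r for x, r in zip(scores, probs))


def third_cumulant_continuized(scores, probs, h, step=0.005, half_width=12.0):
    """Third central moment of X(h) by a Riemann sum over its density.

    Returns the numerical moment and the scale factor a from (14).
    """
    mu = sum(x * r for x, r in zip(scores, probs))
    var = sum((x - mu) ** 2 * r for x, r in zip(scores, probs))
    a = math.sqrt(var / (var + h * h))
    sd = a * h  # standard deviation of each normal component
    centers = [a * x + (1.0 - a) * mu for x in scores]
    norm = sd * math.sqrt(2.0 * math.pi)
    total = 0.0
    n_steps = int(2.0 * half_width / step)
    for i in range(n_steps + 1):
        t = mu - half_width + i * step
        dens = sum(
            r * math.exp(-0.5 * ((t - c) / sd) ** 2)
            for c, r in zip(centers, probs)
        ) / norm
        total += (t - mu) ** 3 * dens * step
    return total, a


# Hypothetical right-skewed score distribution used only for illustration.
scores = [0, 1, 2, 3, 4]
probs = [0.4, 0.3, 0.15, 0.1, 0.05]
k3_x = third_cumulant_discrete(scores, probs)
k3_h, a_x = third_cumulant_continuized(scores, probs, 0.5)
```

With these inputs `k3_h` agrees with `a_x ** 3 * k3_x` to within numerical integration error, and is smaller in absolute size than `k3_x`, as (21) implies.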

The kernel method of equating is now easy to describe. First of all, continuous approximations to F and G are found via (18), i.e.,

$$F_{h_X}(x) \quad \text{and} \quad G_{h_Y}(y), \tag{22}$$

and then the equating functions $e_Y(x)$ and $e_X(y)$ are defined by

$$e_Y(x) = G_{h_Y}^{-1}(F_{h_X}(x)) \tag{23}$$

and

$$e_X(y) = F_{h_X}^{-1}(G_{h_Y}(y)). \tag{24}$$

Note that (23) and (24) define families of equating functions indexed by $h_X$ and $h_Y$.

In (23) and (24) the inverse functions $F_{h_X}^{-1}$ and $G_{h_Y}^{-1}$ are defined by

$x = F_{h_X}^{-1}(p)$ if and only if $p = F_{h_X}(x)$

and

$y = G_{h_Y}^{-1}(p)$ if and only if $p = G_{h_Y}(y)$.

In practice, these inverse functions do not have an explicit form but they can be easily computed by interpolation.
In (22) the bandwidths, $h_X$ and $h_Y$, are called the continuization constants. When they are both chosen to be .3 the resulting equating functions, (23) and (24), agree closely with the traditional equipercentile equating functions, as noted in (17). When $h_X$ and $h_Y$ are both large, the equating functions closely approximate the standard linear equating function as we demonstrate in the next theorem.

Theorem 3: If $e_Y(x)$ is defined by (23) then

$$\lim_{h_X, h_Y \to \infty} e_Y(x) = \mu_Y + \frac{\sigma_Y}{\sigma_X}(x - \mu_X) = \mathrm{Lin}_Y(x).$$

Proof: As $h_X$ and $h_Y \to \infty$, we have $a_X \to 0$ and $a_X h_X \to \sigma_X$ in (19) (and similarly for Y), so $F_{h_X}(x)$ and $G_{h_Y}(y)$ approach these normal cdf's:

$$F_{h_X}(x) \to \Phi\Bigl(\frac{x - \mu_X}{\sigma_X}\Bigr)$$

and

$$G_{h_Y}(y) \to \Phi\Bigl(\frac{y - \mu_Y}{\sigma_Y}\Bigr).$$

Hence

$$G_{h_Y}^{-1}(p) \to \mu_Y + \sigma_Y \Phi^{-1}(p),$$

where $\Phi^{-1}(p)$ is the inverse of the standard normal cdf, therefore

$$e_Y(x) \to \mu_Y + \sigma_Y \Phi^{-1}\Bigl(\Phi\Bigl(\frac{x - \mu_X}{\sigma_X}\Bigr)\Bigr) = \mu_Y + \sigma_Y\, \frac{x - \mu_X}{\sigma_X}.$$

QED
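
The equating function (23) and the limit in Theorem 3 can be explored numerically. The sketch below inverts $G_{h_Y}$ by bisection, which is one simple alternative to the interpolation mentioned in the text; the function names, the bracketing strategy, and the two small score distributions are illustrative assumptions.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mean_var(scores, probs):
    mu = sum(x * p for x, p in zip(scores, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(scores, probs))
    return mu, var


def kernel_cdf(x, scores, probs, h):
    """Continuized cdf from (18)-(19); requires h > 0."""
    mu, var = mean_var(scores, probs)
    a = math.sqrt(var / (var + h * h))
    return sum(
        p * normal_cdf((x - a * xj - (1.0 - a) * mu) / (a * h))
        for xj, p in zip(scores, probs)
    )


def kernel_quantile(p, scores, probs, h, tol=1e-10):
    """Solve kernel_cdf(x) = p for x by bisection (0 < p < 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    mu, var = mean_var(scores, probs)
    step = math.sqrt(var)
    lo = hi = mu
    while kernel_cdf(lo, scores, probs, h) > p:   # widen bracket downward
        lo -= step
    while kernel_cdf(hi, scores, probs, h) < p:   # widen bracket upward
        hi += step
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, scores, probs, h) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def equate_x_to_y(x, x_scores, r, h_x, y_scores, s, h_y):
    """e_Y(x) = G_hY^{-1}(F_hX(x)) as in (23)."""
    return kernel_quantile(kernel_cdf(x, x_scores, r, h_x), y_scores, s, h_y)


# Hypothetical score distributions for two five-point tests.
x_scores, r = [0, 1, 2, 3, 4], [0.1, 0.2, 0.4, 0.2, 0.1]
y_scores, s = [0, 1, 2, 3, 4], [0.05, 0.15, 0.3, 0.3, 0.2]
mu_x, var_x = mean_var(x_scores, r)
mu_y, var_y = mean_var(y_scores, s)
linear_at_3 = mu_y + math.sqrt(var_y / var_x) * (3.0 - mu_x)
kernel_at_3 = equate_x_to_y(3.0, x_scores, r, 1000.0, y_scores, s, 1000.0)
```

With both bandwidths set to 1000, `kernel_at_3` is very close to `linear_at_3`, in line with Theorem 3. Equating a test to itself with equal bandwidths returns the input score, which is a useful sanity check on the inversion.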
We now point out that the objections to equipercentile equating raised at the end of section 1 do not apply to the kernel method of equating. By varying the choice of the continuization constants, $h_X$ and $h_Y$, we may achieve a wide variety of equating functions that are "in between" the traditional linear and equipercentile functions. All of these equating functions [...] the means and variances of $e_X(Y)$ and X and of $e_Y(X)$ and Y. Furthermore, depending on the choice of continuization constants the equating functions [...] the top a [...]
Phase II: The continuization step. In this step, $h_X$ and $h_Y$ are chosen to determine continuous approximations, $\hat F_{h_X}(x)$ and $\hat G_{h_Y}(y)$, to $\hat F(x)$ and $\hat G(y)$. ($\hat F(x)$ and $\hat G(y)$ are obtained by substituting $\hat r_j$ for $r_j$ in (2) and $\hat s_k$ for $s_k$ in (3).) The approximating cdf's have the form

$$\hat F_{h_X}(x) = \sum_j \hat r_j\, \Phi\Bigl(\frac{x - \hat a_X x_j - (1 - \hat a_X)\hat\mu_X}{h_X \hat a_X}\Bigr) \tag{25}$$

and

$$\hat G_{h_Y}(y) = \sum_k \hat s_k\, \Phi\Bigl(\frac{y - \hat a_Y y_k - (1 - \hat a_Y)\hat\mu_Y}{h_Y \hat a_Y}\Bigr). \tag{26}$$

In (25) and (26), the estimated quantities, $\hat a_X$, $\hat a_Y$, $\hat\mu_X$, $\hat\mu_Y$, are all found by substituting $\hat r_j$ for $r_j$ and $\hat s_k$ for $s_k$ in (13) - (15). It should be emphasized that continuization is not a statistical procedure so that "optimal" choices of $h_X$ and $h_Y$ cannot be based on optimizing statistical properties such as the estimation of the $\{r_j\}$ or $\{s_k\}$. Rather, in continuization we are attempting to decide which continuous cdf, $\hat F_{h_X}(x)$, is "closest" in some appropriate sense to $\hat F(x)$. The naive choice of $h_X = 0$ makes $\hat F_{h_X}(x) = \hat F(x)$, but we are then no longer dealing with continuous cdf's and the whole purpose of continuization (i.e. to get unique inverse functions) is defeated. In section 4 we discuss some methods for choosing $h_X$ and $h_Y$.

Phase III: The equating functions are computed via

$$\hat e_Y(x) = \hat G_{h_Y}^{-1}(\hat F_{h_X}(x)) \tag{27}$$

and

$$\hat e_X(y) = \hat F_{h_X}^{-1}(\hat G_{h_Y}(y)). \tag{28}$$
Once phases I and II are completed, phase III is straightforward. However, because it is in this phase that the data on tests X and Y are finally combined we identify it as a separate phase. In phase III, we also include the computation of the standard error of equating (the SEE) that measures the accuracy associated with $\hat e_Y$ and $\hat e_X$. In a companion paper to this one (Holland, King and Thayer, 1988) we give the details of a computation of the SEE that is based on the estimated standard errors for $\hat r_j$ and $\hat s_k$ that are available if these estimated score probabilities are obtained in a particular way using the log-linear models described in Holland and Thayer (1987).
3. THE ESTIMATION STEP

The population score probabilities, $\{r_j\}$ and $\{s_k\}$ defined in (1), must be estimated from the data collected in the type of equating experiment that is available to the analyst. Angoff (1984) describes a variety of these experiments. In this section we shall be concerned with two major classes of such experiments: the random groups designs (Angoff's Designs I and II) and the common item or anchor-test designs (Angoff's Designs III and IV). Each class of design is considered in a separate subsection.

The estimation of the score probabilities is a purely statistical problem in the sense that the $\{r_j\}$ and the $\{s_k\}$ are well-defined parameters and hence estimates of these quantities, say $\{\hat r_j\}$ and $\{\hat s_k\}$, should have desirable statistical properties. Some authors, e.g. Fairbank (1985), refer to the estimation step as "pre-smoothing". While it is true that the estimates, $\{\hat r_j\}$ and $\{\hat s_k\}$, ought to exhibit appropriate degrees of smoothness, this can be achieved in various ways. There are at least four statistical properties that might be considered in the choice of the estimated score probabilities. These are listed below.

Consistency: As sample sizes increase, the estimates $\hat r_j$ and $\hat s_k$ ought to converge, in an appropriate sense, to the population values, $r_j$ and $s_k$.

Positivity: For each possible score value, $x_j$ and $y_k$, the estimated score probabilities, $\hat r_j$ and $\hat s_k$, ought to be positive. For most tests, estimating a score probability to be zero is unreasonable.



Stability: Given the sample sizes involved, the deviations of $\hat r_j$ from $r_j$ and $\hat s_k$ from $s_k$ ought to be as small as possible. Of course these deviations always involve a random element, and the problem is to keep it to a minimum in an appropriate average sense.

Integrity: When possible (as, for example, in the random groups design) the integrity of the sample mean, variance, and possibly other sample moments ought to be preserved in the estimated score distributions, $\{\hat r_j\}$ and $\{\hat s_k\}$. This means, for example, that $\sum_j x_j \hat r_j$ and the sample mean for X are equal and that the sample second moment for X and $\sum_j x_j^2 \hat r_j$ are equal as well.

The approach to the estimation step that we favor is to fit a sequence of parametric models to the data and to make appropriate diagnosis of these fitted models until one is found that describes the data well with as few parameters as possible. The log-linear models described in Rosenbaum and Thayer (1987) and in Holland and Thayer (1987) are especially useful in this regard. These models are all well-behaved because they are exponential families of discrete distributions and may be estimated by maximum likelihood using standard iterative techniques. Because these models are exponential families, maximum likelihood estimation forces the equality of some sample and estimated moments. Our experience is that with 3-6 parameters these models can adequately describe a wide variety of univariate score distributions. Bivariate distributions, useful for anchor-test equating designs, are also easily estimated using the class of log-linear models. Finally, these models automatically satisfy the positivity and integrity conditions listed above. Careful data analysis using these models also leads to the consistency and stability conditions being satisfied as well.







3.1 Random Groups Equating Designs

In Angoff's Design I class of equating experiments, two independent random samples are drawn from a common population, P, and test X is administered to one sample while test Y is administered to the other. Angoff's Design II is similar except that after each sample has been tested with either X or Y, they also take the other test as well, i.e. the two groups take both tests but in counter-balanced order. We will ignore the data pooling problem that arises in Design II and merely mention the close connection of this case to Design I to which we now devote our attention.

The raw data that result from the two random samples in Design I may be summarized as two sets of frequencies, i.e. the X-frequencies,

$n_j$ = number of examinees with $X = x_j$,

and the Y-frequencies,

$m_k$ = number of examinees with $Y = y_k$.

The two sample sizes are given by

$$n = \sum_j n_j \quad \text{and} \quad m = \sum_k m_k.$$

The raw sample proportions $\{n_j/n\}$ and $\{m_k/m\}$ are estimates of the population parameters $\{r_j\}$ and $\{s_k\}$ respectively. However, rarely will the raw sample proportions satisfy the positivity or stability conditions mentioned earlier. Of course, they always satisfy the consistency and integrity conditions, and, when m and n are very large, the raw sample proportions may be acceptable estimates of the population parameters.

Table 1 gives the raw sample frequencies of number-right scores for two parallel, 20-item mathematics tests given to random samples from a national population of examinees.







Table 1 about here 



It is evident that test Y, with a mean of 11.6 (±.1), is about one raw score point easier than test X, which has a mean of 10.8 (±.1). In this example, the single zero in the Y-frequencies would prevent the raw sample proportions from satisfying the positivity condition. Table 2 shows the fitted frequencies and Freeman-Tukey residuals (Bishop, Fienberg and Holland, 1975) for log-linear models of the form

$$\log r_j = \alpha + \sum_{i=1}^{L_X} \beta_i (x_j)^i \quad \text{and} \quad \log s_k = \alpha' + \sum_{i=1}^{L_Y} \beta'_i (y_k)^i, \tag{29}$$

with $L_X = 2$ and $L_Y = 3$. The likelihood ratio chi-square statistic for the model for $\{r_j\}$ is 18.35 on 18 degrees of freedom while that for $\{s_k\}$ is 20.24 on 17 degrees of freedom and these values suggest that, overall, the fits of these two models are quite good. To get a more detailed look at these fits we examine the Freeman-Tukey residuals in Table 2. These residuals should behave roughly like independent standard normal deviates if the model fits adequately. Since these residuals all lie within ± 2.0 and show no pattern we conclude that the fitted probabilities (i.e. $\hat r_j$ and $\hat s_k$) from these models are improved estimates of the population score distributions in the sense of "consistency" and "stability" described earlier.



Table 2 about here 
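
A model of the form (29) with $L_X = 2$ can be fitted by maximum likelihood with a short Newton iteration. The sketch below is one possible implementation, not the authors' software: the function name `fit_two_moment_model`, the rescaling of the scores to the interval [-1, 1] for numerical conditioning, and the step-halving safeguard are all assumptions. The input is the column of X-frequencies from Table 1.

```python
import math

# X-frequencies from Table 1 for scores 0, 1, ..., 20.
x_freq = [1, 3, 8, 25, 30, 64, 67, 95, 116, 124, 156,
          147, 120, 129, 110, 86, 66, 51, 29, 15, 11]


def fit_two_moment_model(freqs, max_iter=50, tol=1e-12):
    """Fit log r_j = alpha + b1*z_j + b2*z_j**2 by Newton's method.

    z_j is the score rescaled to [-1, 1]; a quadratic in z is also a
    quadratic in the raw score, so the fitted model has the form (29).
    Returns the fitted score probabilities.
    """
    n_scores = len(freqs)
    total = float(sum(freqs))
    half = (n_scores - 1) / 2.0
    z = [(j - half) / half for j in range(n_scores)]
    obs = [f / total for f in freqs]
    m1 = sum(p * t for p, t in zip(obs, z))        # observed first moment
    m2 = sum(p * t * t for p, t in zip(obs, z))    # observed second moment

    def weights(b1, b2):
        return [math.exp(b1 * t + b2 * t * t) for t in z]

    def loglik(b1, b2):
        # Log-likelihood per examinee, up to an additive constant.
        return b1 * m1 + b2 * m2 - math.log(sum(weights(b1, b2)))

    b1 = b2 = 0.0
    for _ in range(max_iter):
        w = weights(b1, b2)
        s = sum(w)
        r = [v / s for v in w]
        e1, e2, e3, e4 = (sum(p * t ** k for p, t in zip(r, z))
                          for k in (1, 2, 3, 4))
        g1, g2 = m1 - e1, m2 - e2                  # gradient
        if max(abs(g1), abs(g2)) < tol:
            break
        # Covariance matrix of (z, z^2) under the current fit.
        c11, c12, c22 = e2 - e1 * e1, e3 - e1 * e2, e4 - e2 * e2
        det = c11 * c22 - c12 * c12
        d1 = (c22 * g1 - c12 * g2) / det
        d2 = (c11 * g2 - c12 * g1) / det
        step, base = 1.0, loglik(b1, b2)
        while step > 1e-8 and loglik(b1 + step * d1, b2 + step * d2) < base - 1e-12:
            step *= 0.5                            # halve step if likelihood drops
        b1, b2 = b1 + step * d1, b2 + step * d2
    w = weights(b1, b2)
    s = sum(w)
    return [v / s for v in w]


fitted = fit_two_moment_model(x_freq)
fitted_freq = [sum(x_freq) * r for r in fitted]
```

At convergence the fitted distribution is strictly positive and reproduces the sample mean and second moment, which is the "integrity" property described above. The entries of `fitted_freq` can be compared with the Test X column of Table 2.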
Table 1

Score Frequencies for Tests X and Y
for Random Samples from the Same Population

Score    X-frequencies    Y-frequencies
   0            1                0
   1            3                4
   2            8               11
   3           25               16
   4           30               18
   5           64               34
   6           67               63
   7           95               89
   8          116               87
   9          124              129
  10          156              124
  11          147              154
  12          120              125
  13          129              131
  14          110              109
  15           86               98
  16           66               89
  17           51               66
  18           29               54
  19           15               37
  20           11               17

Total        1453             1455
Mean         10.8             11.6
Sd            3.8              3.9
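
The summary rows of Table 1, and the raw sample proportions $\{n_j/n\}$ and $\{m_k/m\}$ discussed in section 3.1, can be recomputed from the frequency columns. This is a small illustrative sketch; the function name `summarize` is an assumption, and the standard deviation is computed with divisor n.

```python
import math

scores = list(range(21))
x_freq = [1, 3, 8, 25, 30, 64, 67, 95, 116, 124, 156,
          147, 120, 129, 110, 86, 66, 51, 29, 15, 11]
y_freq = [0, 4, 11, 16, 18, 34, 63, 89, 87, 129, 124,
          154, 125, 131, 109, 98, 89, 66, 54, 37, 17]


def summarize(scores, freqs):
    """Return sample size, raw proportions, mean, and standard deviation."""
    n = sum(freqs)
    props = [f / n for f in freqs]
    mean = sum(x * p for x, p in zip(scores, props))
    var = sum((x - mean) ** 2 * p for x, p in zip(scores, props))
    return n, props, mean, math.sqrt(var)


n_x, props_x, mean_x, sd_x = summarize(scores, x_freq)
n_y, props_y, mean_y, sd_y = summarize(scores, y_freq)
```

The zero proportion at score 0 in `props_y` is the violation of the positivity condition noted in the text.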
Table 2

Fitted Score Frequencies and Freeman-Tukey Residuals for Tests X and Y
for Random Samples From the Same Populations

                    Test X                            Test Y
Score   Fitted Frequencies*  FT Residuals   Fitted Frequencies**  FT Residuals
   0           3.30             -1.4               1.71              -1.8
   1           6.44             -1.4               3.77               0.2
   2          11.77             -1.1               7.65               1.2
   3          20.17              1.1              14.24               0.5
   4          32.43             -0.4              24.44              -1.3
   5          48.89              2.0              38.75              -0.7
   6          69.10             -0.2              56.98               0.8
   7          91.57              0.4              77.91               1.2
   8         113.79              0.2              99.35              -1.3
   9         132.58             -0.7             118.54               1.0
  10         144.83              0.9             132.72              -0.8
  11         148.36             -0.1             139.87               1.2
  12         142.49             -1.9             139.15              -1.2
  13         128.32              0.1             131.10               0.0
  14         108.35              0.2             117.31              -0.8
  15          85.79              0.1             100.00              -0.2
  16          63.69              0.3              81.46               0.8
  17          44.33              1.0              63.60               0.3
  18          28.93              0.1              47.73               0.9
  19          17.71             -0.6              34.54               0.5
  20          10.16              0.3              24.18              -1.5

 *2-moment fit
**3-moment fit
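
The residual columns of Table 2 are consistent with the usual Freeman-Tukey deviate, $\sqrt{n} + \sqrt{n + 1} - \sqrt{4\hat m + 1}$, for an observed count n and fitted count $\hat m$. The paper cites Bishop, Fienberg and Holland (1975) without restating the formula, so the sketch below should be read as an assumption about the exact definition used; the function name `freeman_tukey` is illustrative.

```python
import math


def freeman_tukey(observed, fitted):
    """Freeman-Tukey deviate for one cell: sqrt(n) + sqrt(n+1) - sqrt(4m+1)."""
    return (math.sqrt(observed) + math.sqrt(observed + 1.0)
            - math.sqrt(4.0 * fitted + 1.0))


# Observed counts from Table 1 paired with fitted counts from Table 2.
residual_x0 = freeman_tukey(1, 3.30)      # Test X, score 0
residual_x12 = freeman_tukey(120, 142.49) # Test X, score 12
residual_y0 = freeman_tukey(0, 1.71)      # Test Y, score 0
```

Rounded to one decimal place, these three values match the corresponding entries of Table 2.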
3.2 The Anchor-Test Equating Design

In Angoff's Design IV class of equating experiments, two independent random samples are drawn from two different populations, P and Q. Test X and an anchor-test, A, are given to the P-sample, while test Y and the anchor-test, A, are given to the Q-sample. Angoff's Design III is similar except that, in Design III, P and Q are the same population.

In the anchor-test designs, when P and Q differ, there is a choice of population on which to do the equating. In general the synthetic population, S, describes this choice of populations. Let w be a proportion, $0 \le w \le 1$, then S may be denoted $wP + (1-w)Q$ and viewed as composed of two strata, P and Q, that are given relative weight w and 1-w, respectively. This means that probabilities for S are defined as weighted averages of corresponding P and Q probabilities. For example, $\mathrm{Prob}_S\{X = x_j\}$ is defined by:

$$r_j = \mathrm{Prob}_S\{X = x_j\} = w\,\mathrm{Prob}_P\{X = x_j\} + (1-w)\,\mathrm{Prob}_Q\{X = x_j\}$$

and (30)

$$s_k = \mathrm{Prob}_S\{Y = y_k\} = w\,\mathrm{Prob}_P\{Y = y_k\} + (1-w)\,\mathrm{Prob}_Q\{Y = y_k\}.$$

However, (30) shows the need to estimate probabilities for which there can be no data, i.e., $\mathrm{Prob}_Q\{X = x_j\}$ and $\mathrm{Prob}_P\{Y = y_k\}$. This estimation must be accomplished by making assumptions that, in general, cannot be tested. One such assumption, originally suggested by Tucker and discussed in Braun and Holland (1982), is what we call the Conditional Homogeneity Assumption defined below:

Conditional Homogeneity Assumption: The conditional distribution of X given A (and of Y given A) is the same (i.e. is homogeneous) in P and Q, i.e.,

$$\mathrm{Prob}_P\{X = x_j \mid A = a_u\} = \mathrm{Prob}_Q\{X = x_j \mid A = a_u\}$$

and (31)

$$\mathrm{Prob}_P\{Y = y_k \mid A = a_u\} = \mathrm{Prob}_Q\{Y = y_k \mid A = a_u\}.$$

Note that when P = Q the conditional homogeneity assumption is automatically satisfied.

We call the assumption "conditional homogeneity" because it asserts that the conditional distributions of X (and of Y) in P and Q are homogeneous, i.e. the same in the two populations.

The next theorem summarizes the use of this assumption in the estimation or calculation of $\mathrm{Prob}_Q\{X = x_j\}$ and $\mathrm{Prob}_P\{Y = y_k\}$.

Theorem 4: Under the Conditional Homogeneity Assumption

$$\mathrm{Prob}_Q\{X = x_j\} = \sum_u \mathrm{Prob}_P\{X = x_j \mid A = a_u\}\,\mathrm{Prob}_Q\{A = a_u\}$$

and (32)

$$\mathrm{Prob}_P\{Y = y_k\} = \sum_u \mathrm{Prob}_Q\{Y = y_k \mid A = a_u\}\,\mathrm{Prob}_P\{A = a_u\}.$$

The proof of this result is straightforward and omitted.

In (32) we see that the right-hand sides of the equations involve only parameters (i.e. probabilities) that can, in principle, be estimated from the data collected in the design. When (32) is combined with (30), the probabilities $\{r_j\}$ and $\{s_k\}$ can all be estimated. The relevant equations on which this estimation is based are given below:

$$r_j = w\,\mathrm{Prob}_P\{X = x_j\} + (1-w) \sum_u \mathrm{Prob}_P\{X = x_j \mid A = a_u\}\,\mathrm{Prob}_Q\{A = a_u\},$$

and (33)

$$s_k = (1-w)\,\mathrm{Prob}_Q\{Y = y_k\} + w \sum_u \mathrm{Prob}_Q\{Y = y_k \mid A = a_u\}\,\mathrm{Prob}_P\{A = a_u\}.$$
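
The first line of (33) can be computed directly from a joint distribution of (X, A) in P and the marginal distribution of A in Q. The sketch below is illustrative: the function name `synthetic_x_probs`, the nested-list layout of the joint distribution, and the tiny 2 x 2 example are assumptions, and the second line of (33) follows by exchanging the roles of P and Q.

```python
def synthetic_x_probs(joint_p, marg_a_q, w):
    """Score probabilities r_j on the synthetic population S = wP + (1-w)Q.

    joint_p[j][u] = Prob_P{X = x_j, A = a_u}
    marg_a_q[u]   = Prob_Q{A = a_u}
    """
    n_anchor = len(marg_a_q)
    marg_a_p = [sum(row[u] for row in joint_p) for u in range(n_anchor)]
    r = []
    for row in joint_p:
        prob_p = sum(row)  # Prob_P{X = x_j}
        # Prob_Q{X = x_j} from (32), using the conditional distribution in P.
        # Anchor scores with zero probability in P are skipped; this loses
        # mass if Q puts probability there, which is one reason positive
        # fitted probabilities are desirable.
        prob_q = sum(
            row[u] / marg_a_p[u] * marg_a_q[u]
            for u in range(n_anchor)
            if marg_a_p[u] > 0.0
        )
        r.append(w * prob_p + (1.0 - w) * prob_q)
    return r


# Hypothetical joint distribution with two X scores and two anchor scores.
joint_p = [[0.2, 0.1],
           [0.1, 0.6]]
r_same = synthetic_x_probs(joint_p, [0.3, 0.7], 0.5)   # Q anchor matches P
r_diff = synthetic_x_probs(joint_p, [0.5, 0.5], 0.5)   # Q has weaker anchor scores
```

When the anchor distribution in Q equals that in P, the result is the P marginal of X for any w, which mirrors the remark that the assumption is automatically satisfied when P = Q.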
The raw data that arise in anchor-test designs consist of two sets of bivariate frequencies, i.e., the (X,A)-frequencies from P,

$n_{ju}$ = number of examinees with $X = x_j$ and $A = a_u$,

and the (Y,A)-frequencies from Q,

$m_{ku}$ = number of examinees with $Y = y_k$ and $A = a_u$.

The two sample sizes are given by

$$n = \sum_{j,u} n_{ju} \quad \text{and} \quad m = \sum_{k,u} m_{ku}.$$

The raw sample frequencies could be used to estimate the various probabilities that go to make up $r_j$ and $s_k$ given in (33). However, rarely will these raw sample frequencies yield satisfactory estimates of all the probabilities involved except when m and n are very large. Tables 3 and 4 give bivariate frequencies for (X,A) and (Y,A) where X and Y are the same as in section 3.1 and A is a 20-item anchor-test that is parallel to X and Y. Note that in this example, P = Q so that the conditional homogeneity assumption is automatically satisfied.



Tables 3 and 4 about here 



Bivariate Score Distribution for Tests X and A

Let $\{p_{ju}\}$ and $\{q_{ku}\}$ be the population joint distributions given by

$$p_{ju} = \mathrm{Prob}_P\{X = x_j, A = a_u\}, \qquad q_{ku} = \mathrm{Prob}_Q\{Y = y_k, A = a_u\}. \tag{34}$$

Tables 5 and 6 give the fitted distributions that are obtained by fitting log-linear models of the form

$$\log(p_{ju}) = \alpha + \sum_{i=1}^{2} \beta_i (x_j)^i + \sum_{i=1}^{2} \gamma_i (a_u)^i + \delta\, x_j a_u$$

and (35)

$$\log(q_{ku}) = \alpha' + \sum_{i=1}^{2} \beta'_i (y_k)^i + \sum_{i=1}^{2} \gamma'_i (a_u)^i + \delta'\, y_k a_u.$$



Tables 5 and 6 about here 



The likelihood ratio tests for adding extra terms to the models in (35) were not significant. Table 7 gives the estimates of $\hat r_j$ and $\hat s_k$ that follow from these smoothed distributions using (33), with w = .5.

This is an example of an external anchor test. In Holland, King and Thayer (1988) the internal anchor test is also discussed and shown to be easily transformed to the external anchor-test case.



Table 7 about here

Fitted Bivariate Score Distribution for Tests X and A

Estimated Values for $\{r_j\}$ and $\{s_k\}$ Computed from Equation (33)
and the Fitted Distributions in Tables 5 and 6
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0.0C2 
0. 30- 
3.306 
9. 3 1 1 
0.3’ 3 
3,323 
0.339 
0.353 
0.367 
0.380 

Q 1 

0.097 
0.093 
0.093 
0.383 
0.073 
0.055 
0.342 
0.030 
0.023 
0.0 : 2 
4. THE CONTINUIZATION STEP

There are a variety of ways to select the continuization constants $h_X$ and
$h_Y$. Perhaps the easiest is to always use specific fixed values such as
$h_X = h_Y = \infty$, which corresponds to linear equating, or
$h_X = h_Y = .3$, which we have shown to correspond roughly to traditional
equipercentile equating. Rather than always using fixed choices of $h_X$ and
$h_Y$, we suggest a flexible approach toward the choice of continuization
constants, remembering that various goals may need to be achieved in selecting
a satisfactory equating function.

Our approach is to choose $h_X$ so that $F_{h_X}(x)$ is close to F(x) in some
sense. Some care needs to be exercised in selecting a notion of closeness. For
example, if the sup norm, i.e.,

$$\sup_x \, | F_{h_X}(x) - F(x) | \tag{36}$$

is used to measure how close $F_{h_X}(x)$ is to F(x), then this is minimized
for $h_X = 0$ and the result is useless.

The density of $F_{h_X}(x)$, i.e. $F'_{h_X}(x)$, can be used to clarify what we
want in a "good" continuous approximation to F(x). Consider Figure 1. It is the
density that arises when $h_X = .3$ in the example of section 3.1. It exhibits
a "stegosaurian" character that would appear, on its face, to be undesirable.
When $h_X = 1.0$, the result is Figure 2. Evidently, $h_X$ has a big influence
on the shape of the continuous approximation for F(x).
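The effect of $h_X$ on the shape of the density can be reproduced with a short
sketch. It assumes the continuized variable has the form
$X(h_X) = a_X(X + h_X V) + (1 - a_X)\mu_X$ with V standard normal and
$a_X^2 = \sigma_X^2 / (\sigma_X^2 + h_X^2)$, as used in the Appendix. The
function name `continuize` and the binomial score probabilities are
hypothetical and are not the data of section 3.1.

```python
import math

def continuize(xs, r, h):
    """Return (cdf, density) of X(h) = a(X + hV) + (1 - a)mu, V ~ N(0, 1),
    where X takes the values xs with probabilities r."""
    mu = sum(x * p for x, p in zip(xs, r))
    var = sum((x - mu) ** 2 * p for x, p in zip(xs, r))
    a = math.sqrt(var / (var + h * h))
    sd = a * h                                   # spread of each normal bump
    centers = [a * x + (1 - a) * mu for x in xs]  # shrunken score locations

    def cdf(x):
        return sum(p * 0.5 * (1.0 + math.erf((x - c) / (sd * math.sqrt(2.0))))
                   for c, p in zip(centers, r))

    def density(x):
        return sum(p * math.exp(-0.5 * ((x - c) / sd) ** 2)
                   for c, p in zip(centers, r)) / (sd * math.sqrt(2.0 * math.pi))

    return cdf, density

# Illustrative score probabilities: an 11-point binomial distribution.
xs = list(range(11))
r = [math.comb(10, j) * 0.5 ** 10 for j in xs]

cdf_small, den_small = continuize(xs, r, 0.3)
cdf_large, den_large = continuize(xs, r, 1.0)

# With h = .3 the density dips between adjacent scores (the "stegosaurian"
# shape); with h = 1.0 it rises steadily toward the center.
dip_small = den_small(4.5) < min(den_small(4.0), den_small(5.0))
dip_large = den_large(4.5) < min(den_large(4.0), den_large(5.0))
```

In this sketch `dip_small` is true and `dip_large` is false, which mirrors the
contrast between Figures 1 and 2.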



Figures 1 and 2 about here

Figure 1

Graph of the Density $F'_{h_X}(x)$ for $h_X = .3$ (axes: X, DERV)

Figure 2

Graph of the Density $F'_{h_X}(x)$ for $h_X = 1.0$ (axes: X, DERV)

When the $x_j$ are consecutive integers, we can use the density,
$F'_{h_X}(x)$, to create a histogram that we can then compare to the
$\{\hat{r}_j\}$. This is done in the following way. Imagine a histogram
centered on the $\{x_j\}$ with heights $\{F'_{h_X}(x_j)\}$ and unit width. If
$h_X$ is chosen appropriately this histogram will be close to the unit width
histogram on the $x_j$ with heights $\{\hat{r}_j\}$. To choose $h_X$ we can
minimize the "squared difference" criterion,

$$\sum_j \left( \hat{r}_j - F'_{h_X}(x_j) \right)^2. \tag{37}$$

The minimizing values of $h_X$ and $h_Y$ for the example of section 3.1 are
.62 and .57 respectively.
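A minimal sketch of this selection rule is given below. It assumes the same
Gaussian-kernel form of the density as in the Appendix and searches a simple
grid of candidate values; the grid limits, the function names
`kernel_density`, `squared_difference` and `choose_h`, and the binomial score
probabilities are all hypothetical choices for illustration.

```python
import math

def kernel_density(x, xs, r, h):
    """Density of X(h) = a(X + hV) + (1 - a)mu at the point x."""
    mu = sum(v * p for v, p in zip(xs, r))
    var = sum((v - mu) ** 2 * p for v, p in zip(xs, r))
    a = math.sqrt(var / (var + h * h))
    sd = a * h
    total = sum(p * math.exp(-0.5 * ((x - a * v - (1 - a) * mu) / sd) ** 2)
                for v, p in zip(xs, r))
    return total / (sd * math.sqrt(2.0 * math.pi))

def squared_difference(h, xs, r):
    """Criterion (37): sum over j of (r_j - density at x_j)^2."""
    return sum((p - kernel_density(v, xs, r, h)) ** 2 for v, p in zip(xs, r))

def choose_h(xs, r, lo=0.1, hi=3.0, step=0.01):
    """Grid search for the h that minimizes the squared difference."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return min(grid, key=lambda h: squared_difference(h, xs, r))

# Illustrative score probabilities (not the data of section 3.1).
xs = list(range(11))
r = [math.comb(10, j) * 0.5 ** 10 for j in xs]
h_best = choose_h(xs, r)
```

A grid search is used here only because it is easy to check by eye; any
one-dimensional minimizer could be substituted.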

In the case of anchor test equating, i.e. section 3.2, the same
considerations arise but are applied to $\{r_j\}$ and $\{s_k\}$ from (33).
Using the estimates of $r_j$ and $s_k$ in Table 7, the optimal values of $h_X$
and $h_Y$ that minimize the squared difference criterion are .62 and .59,
respectively.

The continuization step can be used to remove the need for a final
"postsmoothing" of the equating function (Fairbank (1985), Kolen (1984), Kolen
and Jarjoura (1987)). Postsmoothing is needed because, if the continuous
approximations to F and G are not smooth enough, the equating functions computed
via (11) will exhibit unreasonable oscillations about an otherwise smooth trend.
Postsmoothing eliminates these oscillations. One situation that can produce
these oscillations arises when tests are formula-scored. In formula-scored
tests with few omitted responses the raw-score distribution will often exhibit
"gaps" at specific scores. Figure 3 illustrates this phenomenon. When
smoothing frequencies that exhibit gaps one has the choice of whether or not the
smoothed frequencies ought to have "gaps" in them. Figure 4 shows a fitted
distribution to the data in Figure 3 that has gaps. It was achieved by fitting
moments to the "gap" scores as well as to all the scores using the techniques
discussed in Holland and Thayer (1987). If a distribution that had no gaps had
been fit to these data, the fit would have been poor according to the usual
goodness-of-fit statistics and it would have been unclear how to choose a
satisfactory model. When data with gaps are encountered in test equating we
recommend that the gaps be accounted for in the estimation step, i.e. by fitting
a model like the one in Figure 4. The reason is that standard goodness-of-fit
tests then provide a rational basis for choosing a model, and the resulting
estimated standard errors for the fitted model (used to compute the standard
error of equating) can be expected to be approximately correct. In the
continuization step, the gaps can then be removed by taking $h_X$ large enough.
Figures 5 and 6 show the approximating densities for the fitted model in Figure
4 for $h_X = 1$ and 3, respectively. When $h_X = 1$ there are still some
remnants of the gaps left but by $h_X = 3$ they are gone and the undesirable
oscillations have been smoothed out. Figure 7 shows the fitted probabilities
from Figure 4 and the continuous density for $h_X = 3$ from Figure 6. The
density shows the general shape of the fitted probabilities but the gaps have
been filled in.

Figure 3

A Raw-score Distribution for a Formula-scored Test
That Exhibits "Gaps" at Regular Intervals on the Score Scale

Figure 4

A Model With "Gaps" Fitted to the Data in Figure 3

We recommend that gaps be preserved in the estimation step and then removed
in the continuization step in order to ensure the accuracy of the standard error
of equating that is discussed extensively in the companion paper, Holland, King
and Thayer (1988).

Figure 7

Graph of the Density for $h_X = 3.0$ and the Fitted Probabilities
Showing How the Gaps Have Been Filled In
5. THE EQUATING STEP

Once continuous approximations to F(x) and G(y) are in hand, it is a
relatively straightforward process to compute the equating functions via (23)
and (24). The only computational issue is the accuracy with which the inverse
functions $F_{h_X}^{-1}(p)$ and $G_{h_Y}^{-1}(p)$ need to be approximated. We
have not investigated this carefully but have found that, for the cases we have
considered, a grid of width .05 has proved sufficient.
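One way such a computation might look is sketched below. It assumes that the
function equating Y to X has the usual composed form
$F_{h_X}^{-1}(G_{h_Y}(y))$, and that the "grid of width .05" refers to a grid
of score values on which the CDF is tabulated and then inverted by linear
interpolation. Both are assumptions of this sketch, as are the function names
and the binomial score probabilities. A large finite value of h stands in for
$h = \infty$.

```python
import bisect
import math

def make_cdf(xs, r, h):
    """Return (cdf, mu, sigma) for X(h) = a(X + hV) + (1 - a)mu, V ~ N(0, 1)."""
    mu = sum(x * p for x, p in zip(xs, r))
    var = sum((x - mu) ** 2 * p for x, p in zip(xs, r))
    a = math.sqrt(var / (var + h * h))
    sd = a * h
    centers = [a * x + (1 - a) * mu for x in xs]

    def cdf(x):
        return sum(p * 0.5 * (1.0 + math.erf((x - c) / (sd * math.sqrt(2.0))))
                   for c, p in zip(centers, r))

    return cdf, mu, math.sqrt(var)

def make_inverse(cdf, lo, hi, width=0.05):
    """Tabulate cdf on a grid and invert it by linear interpolation."""
    n = int(round((hi - lo) / width))
    grid = [lo + i * width for i in range(n + 1)]
    vals = [cdf(g) for g in grid]

    def inverse(p):
        i = bisect.bisect_left(vals, p)  # first grid point with cdf >= p
        if i == 0:
            return grid[0]
        if i > n:
            return grid[n]
        return grid[i - 1] + width * (p - vals[i - 1]) / (vals[i] - vals[i - 1])

    return inverse

def equate_y_to_x(xs, r, hx, ys, s, hy, width=0.05):
    """Return the function y -> F_inverse(G(y)), with F inverted on a grid."""
    F, _, sigma_x = make_cdf(xs, r, hx)
    G, _, _ = make_cdf(ys, s, hy)
    pad = 6.0 * sigma_x  # tabulate well beyond the score range
    F_inv = make_inverse(F, min(xs) - pad, max(xs) + pad, width)
    return lambda y: F_inv(G(y))

# Illustrative score probabilities for two 11-point tests.
scores = list(range(11))
r = [math.comb(10, j) * 0.5 ** 10 for j in scores]
s = [math.comb(10, k) * 0.6 ** k * 0.4 ** (10 - k) for k in scores]

e_small = equate_y_to_x(scores, r, 0.3, scores, s, 0.3)      # roughly equipercentile
e_large = equate_y_to_x(scores, r, 100.0, scores, s, 100.0)  # nearly linear
e_same = equate_y_to_x(scores, r, 0.6, scores, r, 0.6)       # a test equated to itself
```

Equating a test to itself should return the identity up to the grid width, and
for very large h the result should agree closely with the linear equating
function $\mu_X + (\sigma_X/\sigma_Y)(y - \mu_Y)$; both are convenient checks
on an implementation.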

In the examples of sections 3.1 and 3.2 the equating functions are very
nearly linear. Figure 8 shows the difference between the graphs of the linear
equating function ($h_X = h_Y = \infty$) and the approximate equipercentile
equating function ($h_X = h_Y = .3$) for equating Y to X for the example in
section 3.1. While there are some differences between these equating functions
they are quite small in this example. Figure 9 shows three equating functions
for simulated data in which there is a great deal of curvilinearity when
$h_X = h_Y = .3$. The equating functions for $h_X = h_Y = 5$ and
$h_X = h_Y = 10$ are also shown to illustrate that as the h's increase the
equating functions become more linear.

Once $h_X$ and $h_Y$ are selected, $F_{h_X}(x)$ and $G_{h_Y}(y)$ are determined
as functions of the estimated score probabilities $\{\hat{r}_j\}$ and
$\{\hat{s}_k\}$. The computation of the standard error of equating (SEE) can
then proceed by a straightforward, but tedious, application of the δ-method of
computing asymptotic variances of functions of random quantities; in this case
the random quantities are $\{\hat{r}_j\}$ and $\{\hat{s}_k\}$. This is the
approach described in detail in our companion paper, Holland, King and Thayer
(1988).

Figures 8 and 9 about here

Figure 8

The Difference Between the Linear and the Approximate Equipercentile
Equating Functions, for the Example of Section 3.1

Figure 9

Three Equating Functions for Simulated Data
6. DISCUSSION

We believe that the kernel method of equating, when coupled with estimated 
score distributions using log-linear models, has a number of advantages over 
other observed-score equating methods. 

First of all, the three phases, estimation, continuization and equating,
form a unified approach to many problems that arise in equating. Most of the
difficulties in equating arise in the estimation and continuization phases and
these are quite different and ought to be treated separately. The problem of
devising equating diagnostics is fairly easy once this separation is made. Some
diagnostics will concern the estimation phase (i.e. the adequacy of model fit)
while others concern the choice of continuization constant (e.g. the treatment
of the "gaps" in formula score distributions).

Because log-linear models are very flexible they provide useful models for 
both large and small samples. Hence their use with the kernel method eliminates 
many of the problems that arise in equating with small samples of examinees. At 
the same time, large samples can also be fit adequately using these models. 

The kernel method essentially contains linear and traditional equipercentile
methods as special cases and can therefore exploit the best features of both
methods. Furthermore, because it can handle both random groups and common item
designs, the use of log-linear models in the latter case provides a
substantially improved version of the method called "frequency estimation" (as
called for in Braun and Holland, 1982).



The kernel method does not force the high and low scores on the two tests to
match as traditional equipercentile (and IRT true-score) methods do. It also
does not restrict the equating function to be defined for only those raw score
values that occur on the test. This can be very important for the chains of
equatings that accumulate as a long sequence of new test forms is built up. In
addition, because $F_{h_X}$ and $G_{h_Y}$ are given by analytic formulas it is
unnecessary to specify the equating function by a table as most equipercentile
methods do. Instead, if $h_X$, $h_Y$ and the estimated probabilities
$\{\hat{r}_j\}$ and $\{\hat{s}_k\}$ are kept, $F_{h_X}$, $G_{h_Y}$ and the
equating functions can be computed anew and chained together whenever they are
needed. Although this is more complicated than carrying equating chains through
by linear equating, it is still more satisfactory than the ad hoc tables of
traditional equipercentile equating.

Finally, computationally efficient methods of estimating the standard error
of equating are available and, for the first time, honest SEEs can be provided
for a wide variety of equating designs. These SEEs reflect the shape of
the equating function, the design of the equating experiment, and the method
used to pre-smooth the data in the estimation phase of the equating process.

In view of these advantages we see the kernel method of equating as a 
complete equating package that can provide measurement statisticians with a 
powerful set of tools for solving practical everyday problems in equating. 

Future research in this area might explore a range of topics such as these.

1) Are there methods for choosing $h_X$ and $h_Y$ that are better than the
minimization of the squared difference criterion, (37)?

2) What is the effect of data dependent choices of $h_X$ and $h_Y$ on the SEE?

3) Are the SEEs found by the δ-method good enough or are higher-order
methods needed?

4) What is the relation between the kernel method and IRT or linear true- 
score equating methods? 

5) What role can the kernel method play in the assessment of the invariance 
of equating functions across different populations of examinees? 
APPENDIX: Proof of Theorem 2.

Let the moment generating function (mgf) of X be $M_X(t)$. It is well known
that the Taylor expansion of $\log[M_X(t)]$ is given by

$$\log[M_X(t)] = \mu_X t + \sigma_X^2 t^2/2 + \sum_{j=3}^{\infty} \kappa_{jX}\, t^j / j!.$$

But the mgf of $X(h_X)$ is given by

$$E[\exp\{tX(h_X)\}] = E[\exp\{t(a_X(X + h_X V) + (1-a_X)\mu_X)\}]$$

$$= \exp\{t(1-a_X)\mu_X\}\, E[\exp\{t a_X X + t a_X h_X V\}].$$

But since X and V are independent,

$$E[\exp\{t a_X X + t a_X h_X V\}] = E[\exp\{t a_X X\}]\, E[\exp\{t a_X h_X V\}] = M_X(t a_X)\, M_V(t a_X h_X), \tag{38}$$

where $M_X$ and $M_V$ are the mgfs of X and V respectively.



But it is well known that

$$M_V(t) = \exp\{\tfrac{1}{2} t^2\},$$

so that the mgf of $X(h_X)$ can be expressed as

$$E[\exp\{tX(h_X)\}] = \exp\{t(1-a_X)\mu_X\}\, M_X(t a_X)\, \exp\{\tfrac{1}{2} t^2 a_X^2 h_X^2\}.$$

Now take logs to get the cumulants, i.e.,

$$\log E[\exp\{tX(h_X)\}] = t(1-a_X)\mu_X + \tfrac{1}{2} t^2 a_X^2 h_X^2 + \log[M_X(t a_X)]. \tag{39}$$



Now combine (38) and (39), applying the Taylor expansion above to
$\log[M_X(t a_X)]$, to get

$$\log E[\exp\{tX(h_X)\}] = ((1-a_X)\mu_X + a_X \mu_X)\, t + (a_X^2 h_X^2 + \sigma_X^2 a_X^2)\, t^2/2 + \sum_{j=3}^{\infty} \kappa_{jX} (a_X)^j\, t^j / j!.$$

But $(1-a_X)\mu_X + a_X \mu_X = \mu_X$ and
$a_X^2 h_X^2 + \sigma_X^2 a_X^2 = a_X^2 (h_X^2 + \sigma_X^2) = \sigma_X^2$, so we
obtain

$$\log E[\exp\{tX(h_X)\}] = \mu_X t + \sigma_X^2 t^2/2 + \sum_{j=3}^{\infty} (a_X)^j \kappa_{jX}\, t^j / j!.$$

But the coefficients of a Taylor expansion are unique so the cumulants of
$X(h_X)$ of order $j \ge 3$ are $(a_X)^j \kappa_{jX}$, QED.
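
The result can be checked numerically for the first three cumulants. The
sketch below integrates the density of $X(h_X)$ on a fine grid and compares the
mean, variance and third central moment with $\mu_X$, $\sigma_X^2$ and
$(a_X)^3 \kappa_{3X}$. The skewed binomial score distribution, the value
h = .6, the integration step and the function names are hypothetical choices
made for this check.

```python
import math

def discrete_cumulants(xs, r):
    """Mean, variance and third central moment (the first three cumulants)."""
    mu = sum(x * p for x, p in zip(xs, r))
    var = sum((x - mu) ** 2 * p for x, p in zip(xs, r))
    k3 = sum((x - mu) ** 3 * p for x, p in zip(xs, r))
    return mu, var, k3

def continuized_cumulants(xs, r, h, step=0.01):
    """First three cumulants of X(h) = a(X + hV) + (1 - a)mu, obtained by
    numerical integration of its density; also returns a."""
    mu, var, _ = discrete_cumulants(xs, r)
    a = math.sqrt(var / (var + h * h))
    sd = a * h
    centers = [a * x + (1 - a) * mu for x in xs]

    def density(x):
        return sum(p * math.exp(-0.5 * ((x - c) / sd) ** 2)
                   for c, p in zip(centers, r)) / (sd * math.sqrt(2.0 * math.pi))

    lo, hi = min(centers) - 10.0 * sd, max(centers) + 10.0 * sd
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    weights = [step * density(x) for x in grid]  # Riemann sum; tails negligible
    m1 = sum(x * w for x, w in zip(grid, weights))
    m2 = sum((x - m1) ** 2 * w for x, w in zip(grid, weights))
    m3 = sum((x - m1) ** 3 * w for x, w in zip(grid, weights))
    return m1, m2, m3, a

# A skewed illustrative distribution so that the third cumulant is nonzero.
xs = list(range(11))
r = [math.comb(10, j) * 0.3 ** j * 0.7 ** (10 - j) for j in xs]

mu, var, k3 = discrete_cumulants(xs, r)
m1, m2, m3, a = continuized_cumulants(xs, r, h=0.6)
```

Under the theorem, `m1` and `m2` should match `mu` and `var`, and `m3` should
match `a ** 3 * k3`, up to integration error.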
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